As the size of datasets used in deep learning tasks increases, the noisy label problem, i.e., making deep learning robust to incorrectly labeled data, has become increasingly important. In this paper, we propose a method for learning from noisy label data using label noise selection with test-time augmentation (TTA) cross-entropy and classifier learning with the NoiseMix method. For label noise selection, we propose TTA cross-entropy, which measures the cross-entropy of predictions on test-time augmented training data. For classifier learning, we propose the NoiseMix method, based on the MixUp and BalancedMix methods, which mixes samples from the noisy and the clean label data. In experiments on the ISIC-18 public skin lesion diagnosis dataset, the proposed TTA cross-entropy outperformed the conventional cross-entropy and TTA uncertainty in detecting label noise during the label noise selection process. Moreover, the proposed NoiseMix not only outperformed state-of-the-art methods in classification performance but also showed the greatest robustness to label noise in classifier learning.
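The abstract does not give the exact formulation of TTA cross-entropy, but a minimal sketch of the idea, averaging the cross-entropy of a model's predictions over augmented copies of a training sample against its given label, could look as follows. The `predict` stand-in model and the toy augmentations are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def tta_cross_entropy(predict, x, label, augmentations):
    """Average cross-entropy between the model's predictions on augmented
    copies of x and the sample's (possibly noisy) label; higher values
    suggest the label is noisy."""
    losses = []
    for aug in augmentations:
        p = predict(aug(x))                       # class-probability vector
        losses.append(-np.log(p[label] + 1e-12))  # CE w.r.t. the given label
    return float(np.mean(losses))

# Toy check: a stand-in model that always predicts class 0 with confidence 0.9.
predict = lambda x: np.array([0.9, 0.05, 0.05])
augs = [lambda x: x, lambda x: x + 0.1]           # stand-in augmentations
score_clean = tta_cross_entropy(predict, np.zeros(4), 0, augs)  # label agrees
score_noisy = tta_cross_entropy(predict, np.zeros(4), 1, augs)  # label disagrees
```

Samples whose score exceeds a threshold would be routed to the noisy partition before NoiseMix-style training.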
Uncertainty estimation of a trained deep learning network provides important information for improving learning efficiency or evaluating the reliability of the network's predictions. In this paper, we propose a method of uncertainty estimation for multi-class image classification using test-time mixup augmentation (TTMA). To improve the ability of existing aleatoric uncertainty to discriminate between correct and incorrect predictions, we propose data uncertainty, obtained by applying mixup augmentation to the test data and measuring the entropy of the histogram of predicted labels. In addition to data uncertainty, we propose class-specific uncertainty, which presents the aleatoric uncertainty associated with a specific class and can provide information on the class confusion and class similarity of the trained network. The proposed methods are validated on two public datasets: the ISIC-18 skin lesion diagnosis dataset and the CIFAR-100 real-world image classification dataset. The experiments demonstrate that (1) the proposed data uncertainty separates correct and incorrect predictions better than existing uncertainty measures, thanks to the mixup perturbation, and (2) the proposed class-specific uncertainty provides information on the class confusion and class similarity of the trained network for both datasets.
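A minimal sketch of what such a mixup-based data uncertainty might look like, assuming the histogram is built from argmax labels over mixup-perturbed copies of the test sample; the `predict` classifier and the mixing ratio `lam` are illustrative stand-ins, not the paper's setup:

```python
import numpy as np

def ttma_data_uncertainty(predict, x, pool, lam=0.8):
    """Entropy of the histogram of predicted labels over mixup-perturbed
    copies of x: each pool sample is mixed into x and the argmax class
    of the resulting prediction is recorded."""
    labels = [int(np.argmax(predict(lam * x + (1 - lam) * p))) for p in pool]
    hist = np.bincount(labels, minlength=len(predict(x)))
    q = hist / hist.sum()
    q = q[q > 0]
    return float(-(q * np.log(q)).sum())          # entropy of the histogram

# Stand-in classifier: class 0 if the mean input is positive, else class 1.
predict = lambda z: np.array([1.0, 0.0]) if z.mean() > 0 else np.array([0.0, 1.0])
x = np.ones(3)
u_stable = ttma_data_uncertainty(predict, x, [np.ones(3), np.full(3, 0.5)])
u_unstable = ttma_data_uncertainty(predict, x, [np.ones(3), np.full(3, -9.0)])
```

A prediction that survives every mixup perturbation yields zero entropy, while one that flips classes under perturbation yields high entropy, which is what makes the measure separate correct from incorrect predictions.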
Generally, regularization-based continual learning models limit access to previous task data to imitate real-world settings with memory and privacy constraints. However, this introduces a problem: these models cannot track their performance on each task. In other words, current continual learning methods are vulnerable to attacks on previous tasks. We demonstrate the vulnerability of regularization-based continual learning methods by presenting a simple task-specific training-time adversarial attack that can be applied during the learning of a new task. Training data generated by the proposed attack causes performance degradation on the specific task targeted by the attacker. Experimental results confirm the vulnerability identified in this paper and demonstrate the importance of developing continual learning models that are robust to adversarial attacks.
Pre-training vision-language models with contrastive objectives has shown promising results that both scale to large uncurated datasets and transfer to many downstream applications. Some follow-up works targeted improved data efficiency by adding self-supervision, but the inter-domain (image-text) contrastive loss and the intra-domain (image-image) contrastive loss in these works are defined on individual spaces, so many feasible combinations of supervision are overlooked. To overcome this issue, we propose UniCLIP, a unified framework for contrastive language-image pre-training. UniCLIP integrates the contrastive losses of both inter-domain pairs and intra-domain pairs into a single universal space. Three key components of UniCLIP resolve the discrepancies that occur when integrating contrastive losses from different domains: (1) augmentation-aware feature embedding, (2) the MP-NCE loss, and (3) a domain-dependent similarity measure. UniCLIP outperforms previous vision-language pre-training methods on various single- and multi-modality downstream tasks. In our experiments, we show that each component contributes well to the final performance.
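The abstract does not specify the MP-NCE loss, but a plausible multi-positive NCE sketch over a single shared embedding space, where image-image and image-text positives each contribute an InfoNCE term against common negatives, might look like the following. All names and the temperature `tau` are assumptions, not UniCLIP's definitions:

```python
import numpy as np

def mp_nce(anchor, positives, negatives, tau=0.1):
    """Multi-positive NCE in one universal embedding space: every positive
    (whether it comes from an image-image or an image-text pair) contributes
    its own InfoNCE term against the shared set of negatives."""
    def sim(a, b):                                 # cosine similarity
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    neg_sum = sum(np.exp(sim(anchor, n) / tau) for n in negatives)
    losses = []
    for p in positives:
        pos = np.exp(sim(anchor, p) / tau)
        losses.append(-np.log(pos / (pos + neg_sum)))
    return float(np.mean(losses))

anchor = np.array([1.0, 0.0])
negatives = [np.array([0.0, 1.0]), np.array([-1.0, 1.0])]
loss_aligned = mp_nce(anchor, [np.array([1.0, 0.1])], negatives)
loss_opposed = mp_nce(anchor, [np.array([-1.0, 0.0])], negatives)
```

The point of the unified space is that the same loss form applies regardless of which domain the positive pair comes from.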
Pre-trained representations are one of the key ingredients of modern deep learning's success. However, existing works on continual learning methods have mostly focused on incrementally learning models from scratch. In this paper, we explore an alternative framework for incremental learning in which we continually fine-tune a model from pre-trained representations. Our method exploits linearization techniques for pre-trained neural networks to achieve simple and effective continual learning. We show that this allows us to design a linear model for which quadratic parameter regularization is the optimal continual learning strategy, while enjoying the high performance of neural networks. We also show that the proposed algorithm makes parameter regularization methods applicable to class-incremental problems. In addition, we provide a theoretical reason why existing parameter-space regularization algorithms such as EWC underperform on neural networks trained with the cross-entropy loss. We show that the proposed method can prevent forgetting while achieving high continual fine-tuning performance on image classification tasks. To demonstrate that our method can be applied to general continual learning settings, we evaluate it on data-incremental, task-incremental, and class-incremental learning problems.
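One way to read the linearization claim: for a model linearized around the pre-trained parameters, continual fine-tuning with a quadratic (EWC-style) penalty becomes a ridge-type problem with a closed-form solution. The sketch below is our own illustration of that reduction, not the paper's algorithm; `F` stands in for a Fisher-style importance matrix and `J` for the Jacobian of the network outputs at the pre-trained point:

```python
import numpy as np

def continual_ridge(J, r, theta0, theta_anchor, F, lam):
    """Closed-form minimizer of ||J (theta - theta0) - r||^2
    + lam * (theta - theta_anchor)^T F (theta - theta_anchor), i.e.
    fine-tuning the linearized model under a quadratic parameter penalty."""
    A = J.T @ J + lam * F
    b = J.T @ r + lam * F @ (theta_anchor - theta0)
    return theta0 + np.linalg.solve(A, b)

J = np.eye(2)                     # toy Jacobian of the model at theta0
r = np.array([1.0, 2.0])          # residual targets y - f(x; theta0)
theta0 = np.zeros(2)
anchor = np.array([5.0, 5.0])     # parameters remembered from the old task
fit_only = continual_ridge(J, r, theta0, anchor, np.eye(2), 0.0)
stay_put = continual_ridge(J, r, theta0, anchor, np.eye(2), 1e8)
```

With `lam = 0` the solution fits the new task exactly; as `lam` grows, it collapses onto the old-task anchor, which is the forgetting/plasticity trade-off the quadratic penalty controls.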
The non-local (NL) block is a popular module that demonstrates the capability to model global context. However, the NL block generally has heavy computation and memory costs, so applying the block to high-resolution feature maps is impractical. In this paper, to investigate the efficacy of the NL block, we empirically analyze whether the magnitude and direction of input feature vectors properly affect the attention between vectors. The results show the inefficacy of the softmax operation, which is generally used to normalize the attention map of the NL block. Attention maps normalized with the softmax operation rely heavily on the magnitude of the key vectors, and performance degenerates if the magnitude information is removed. By replacing the softmax operation with a scaling factor, we demonstrate improved performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet. In addition, our method shows robustness to embedding channel reduction and embedding weight initialization. Notably, our method makes multi-head attention employable without additional computational cost.
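A minimal sketch of the contrast the abstract draws, with the second function replacing the softmax by a simple 1/N scaling coefficient (the exact scaling used in the paper is not given in the abstract, so 1/N is an assumption):

```python
import numpy as np

def nl_softmax_attention(q, K, V):
    """Standard NL-block attention: softmax-normalized query-key dot products."""
    logits = K @ q
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V

def nl_scaled_attention(q, K, V):
    """Softmax replaced by a scaling coefficient (here 1/N): the aggregation
    becomes linear in the keys, so key magnitudes no longer warp the
    attention map non-linearly."""
    return ((K @ q) / len(K)) @ V

q = np.array([1.0, 0.0])
K = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Scaling all keys by a constant rescales the output of the scaled variant proportionally, whereas the softmax variant redistributes the attention weights non-linearly, which is the magnitude dependence the paper criticizes.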
When a high-resolution (HR) image is degraded into a low-resolution (LR) image, the image loses some of its information. As a result, multiple HR images can correspond to a single LR image. Most existing methods do not consider the uncertainty caused by stochastic attributes, which can only be inferred probabilistically. The predicted HR image is therefore often blurry, because the network tries to reflect all possibilities in a single output image. To overcome this limitation, this paper proposes a novel face super-resolution (SR) scheme that explores this uncertainty through stochastic modeling. Specifically, the information in the LR image is encoded into deterministic and stochastic attributes separately. Furthermore, an input-conditional attribute predictor is proposed and trained separately to predict, from the LR image alone, the stochastic attributes that partially survive in it. Extensive evaluations show that the proposed method successfully reduces the uncertainty in the learning process and outperforms existing state-of-the-art methods.
Recent self-supervised video representation learning methods focus on maximizing the similarity between multiple augmented views from the same video and largely rely on the quality of the generated views. However, most existing methods lack a mechanism to prevent representation learning from being biased towards static information in the video. In this paper, we propose frequency augmentation (FreqAug), a spatio-temporal data augmentation method in the frequency domain for video representation learning. FreqAug stochastically removes specific frequency components from the video so that the learned representation captures essential features from the remaining information for various downstream tasks. Specifically, FreqAug pushes the model to focus more on dynamic features than on static features in the video by dropping spatial or temporal low-frequency components. To verify the generality of the proposed method, we experiment with FreqAug on multiple self-supervised learning frameworks along with standard augmentations. Transferring the improved representation to five video action recognition and two temporal action localization downstream tasks shows consistent improvements over baselines.
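A minimal sketch of dropping temporal low-frequency components with an FFT; the cutoff and axis conventions are assumptions, not FreqAug's exact parameterization:

```python
import numpy as np

def freq_drop_low(video, axis=0, cutoff=1):
    """Zero out frequency bins with |integer frequency| < cutoff along
    `axis` (time by default). With cutoff=1 only the static (DC)
    component is removed, pushing the signal towards dynamic content."""
    spec = np.fft.fft(video, axis=axis)
    k = np.fft.fftfreq(video.shape[axis]) * video.shape[axis]  # integer bins
    shape = [1] * video.ndim
    shape[axis] = -1
    mask = (np.abs(k) >= cutoff).reshape(shape)
    return np.real(np.fft.ifft(spec * mask, axis=axis))

t = np.arange(8)
static = np.ones((8, 2, 2))                                   # no motion
dynamic = np.sin(2 * np.pi * t / 8)[:, None, None] * np.ones((8, 2, 2))
```

A purely static clip is erased entirely by the augmentation, while a zero-mean dynamic signal passes through unchanged, which illustrates why the learned representation is pushed towards motion.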
With the development of 3D scanning technologies, 3D vision tasks have become a popular research area. Owing to the large amount of data acquired by sensors, unsupervised learning is essential for understanding and utilizing point clouds without an expensive annotation process. In this paper, we propose a novel framework and an effective auto-encoder architecture named PSG-Net for reconstruction-based learning of point clouds. Unlike existing studies that use fixed or random 2D points, our framework generates input-dependent point-wise features for the latent point set. PSG-Net uses the encoded input to produce point-wise features through a seed generation module, and extracts richer features in multiple stages of gradually increasing resolution by progressively applying a seed feature propagation module. We experimentally prove the effectiveness of PSG-Net; PSG-Net shows state-of-the-art performance in point cloud reconstruction and unsupervised classification, and achieves performance comparable to counterpart methods in supervised completion.
Existing 3D human pose estimation algorithms trained on distortion-free datasets suffer performance drops when applied to new scenarios with specific camera distortions. In this paper, we propose a simple yet effective model for 3D human pose estimation in videos that can quickly adapt to any distortion environment by leveraging MAML, a representative optimization-based meta-learning algorithm. We consider a sequence of 2D keypoints with a specific distortion as a single task of MAML. However, since there is no large-scale dataset of distorted environments, we propose an efficient method to generate synthetic distorted data from undistorted 2D keypoints. For evaluation, we assume two practical testing situations depending on whether a motion capture sensor is available. In particular, we propose an inference-stage optimization using bone-length symmetry and consistency. Extensive evaluations show that our proposed method successfully adapts to various distortions in the test phase and outperforms existing state-of-the-art approaches. The proposed method is useful in practice because it requires neither camera calibration nor additional computation in the test setup.
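The inference-stage bone-length symmetry term is not spelled out in the abstract; a minimal sketch of one plausible form, penalizing squared differences between mirrored bone lengths of a predicted pose, could be the following. The joint indexing is illustrative, not the paper's skeleton definition:

```python
import numpy as np

def bone_symmetry_loss(pose, left_bones, right_bones):
    """Mean squared difference between mirrored bone lengths of a 3D pose
    (a J x 3 array); bones are (parent, child) joint-index pairs."""
    def length(bone):
        parent, child = bone
        return np.linalg.norm(pose[child] - pose[parent])
    return float(np.mean([(length(l) - length(r)) ** 2
                          for l, r in zip(left_bones, right_bones)]))

# Toy 3-joint skeleton: root at joint 0, "left arm" 0->1, "right arm" 0->2.
symmetric = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
asymmetric = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-2.0, 0.0, 0.0]])
```

Minimizing such a term at inference time nudges the adapted pose towards anatomically plausible, left-right consistent bone lengths without requiring any calibration data.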